#cross-modal alignment
29/04/2025
UniME: Advancing Multimodal Representations with a Two-Stage MLLM Framework
UniME introduces a two-stage framework that significantly improves multimodal representation learning by leveraging textual knowledge distillation and hard negative instruction tuning, outperforming existing models on multiple benchmarks.